hyunseoki/ReMoDetect-deberta

hyunseoki committed on Sep 26, 2024

Commit 9421869 · verified · 1 Parent(s): d3545ce

Create README.md

Files changed (1)

README.md +51 -0

README.md ADDED

	@@ -0,0 +1,51 @@

+---
+language:
+- en
+base_model:
+- OpenAssistant/reward-model-deberta-v3-large-v2
+---
+## ReMoDetect: Robust Detection of Large Language Model Generated Texts Using Reward Model
+ReMoDetect addresses the growing risks of large language model (LLM) usage, such as generating fake news, by improving detection of LLM-generated text (LGT). Rather than targeting individual models, ReMoDetect identifies traits common to LLMs by focusing on alignment training, in which LLMs are fine-tuned to generate human-preferred text. Our key finding is that aligned LLMs produce texts with higher estimated preference than human-written ones, making them detectable using a reward model trained on the human preference distribution.
+In ReMoDetect, we introduce two training strategies to enhance the reward model’s detection performance:
+1. **Continual preference fine-tuning**, which pushes the reward model to further prefer aligned LGTs.
+2. **Reward modeling of Human/LLM mixed texts**, where we use rephrased human-written texts as a middle ground between LGTs and human texts to improve detection.
+This approach achieves state-of-the-art results across several LLMs. For more technical details, check out our [paper](https://arxiv.org/abs/2405.17382).
+Please check the [official repository](https://github.com/hyunseoklee-ai/ReMoDetect) and [project page](https://github.com/hyunseoklee-ai/ReMoDetect) for more implementation details and updates.
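The two training strategies above both amount to teaching the reward model an ordering over texts: human-written below rephrased (mixed) below LLM-generated. The following is a minimal sketch of how such an ordering could be expressed with a Bradley-Terry style pairwise loss, which is commonly used in reward modeling. It is an assumption for illustration, not the objective from the paper, which may include additional terms. The function names `pairwise_preference_loss` and `ordering_loss` are hypothetical, and plain Python floats stand in for reward model outputs.

```python
import math


def pairwise_preference_loss(preferred_reward: float, rejected_reward: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(preferred - rejected))."""
    margin = preferred_reward - rejected_reward
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign to avoid overflow.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def ordering_loss(r_human: float, r_mixed: float, r_llm: float) -> float:
    """Sum of pairwise losses that is small when r_human < r_mixed < r_llm."""
    return (
        pairwise_preference_loss(r_llm, r_human)      # LGT preferred over human text
        + pairwise_preference_loss(r_llm, r_mixed)    # LGT preferred over rephrased text
        + pairwise_preference_loss(r_mixed, r_human)  # rephrased text preferred over human text
    )


# Illustrative reward values only; they are not real model outputs.
print(ordering_loss(r_human=-2.0, r_mixed=0.0, r_llm=2.0))  # ordering respected: small loss
print(ordering_loss(r_human=2.0, r_mixed=0.0, r_llm=-2.0))  # ordering violated: large loss
```

In an actual training setup the rewards would be tensors produced by the reward model and the loss would be averaged over a batch; the scalar version here only shows the shape of the objective.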
+### How to Use
+``` python
+import torch
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+model_id = "hyunseoki/ReMoDetect-deberta"
+# Load the tokenizer and the reward model from the Hugging Face Hub.
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+detector = AutoModelForSequenceClassification.from_pretrained(model_id)
+
+text = 'This text was written by a person.'
+inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=512, padding=True)
+with torch.no_grad():  # no gradients are needed for scoring
+    # Reward score for the text; a higher score suggests LLM-generated text.
+    score = detector(**inputs).logits[0]
+print(score)
+```
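The detector returns a raw reward score rather than a label, and the key finding above implies that higher scores point toward LLM-generated text. The sketch below shows one way to work with such scores: measuring how well they separate two sets of texts with AUROC, and applying a decision threshold. The helper names `auroc` and `is_llm_generated`, the example scores, and the threshold value are all hypothetical placeholders; a usable threshold would have to be calibrated on labeled data.

```python
def auroc(human_scores, llm_scores):
    """Probability that a random LLM text outscores a random human text (ties count half)."""
    wins = 0.0
    for llm in llm_scores:
        for human in human_scores:
            if llm > human:
                wins += 1.0
            elif llm == human:
                wins += 0.5
    return wins / (len(llm_scores) * len(human_scores))


def is_llm_generated(score, threshold):
    """Flag a text as LLM-generated when its reward score exceeds the threshold."""
    return score > threshold


# Placeholder scores for illustration; real values come from detector(**inputs).logits.
human_scores = [-3.1, -1.2, 0.4]
llm_scores = [0.9, 2.5, -0.5]

print(auroc(human_scores, llm_scores))
print([is_llm_generated(s, threshold=0.5) for s in llm_scores])
```

AUROC is threshold-free, so it is a convenient way to compare detectors before committing to a specific cutoff.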
+### Citation
+If you find ReMoDetect-deberta useful for your work, please cite the following paper:
+``` latex
+@misc{lee2024remodetect,
+      title={ReMoDetect: Reward Models Recognize Aligned LLM's Generations},
+      author={Hyunseok Lee and Jihoon Tack and Jinwoo Shin},
+      year={2024},
+      eprint={2405.17382},
+      archivePrefix={arXiv},
+      primaryClass={cs.LG},
+      url={https://arxiv.org/abs/2405.17382},
+}
+```