Uppaal committed · verified
Commit 2754932 · Parent(s): 4b949d2

Update README.md

Files changed (1):
1. README.md  +2 −4
README.md CHANGED
@@ -93,7 +93,7 @@ Use the code below to get started with the model.
 
  ```
  from transformers import AutoTokenizer, AutoModelForCausalLM
- model_id = "Uppaal/Mistral-ProFS-toxicity"
+ model_id = "Uppaal/Mistral-ProFS-safety"
 
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(model_id)
@@ -108,10 +108,8 @@ print(tokenizer.decode(out[0], skip_special_tokens=True))
  ## Training (Editing) Details
 
  ### Data
- We use the pairwise toxicity preference dataset introduced by [Lee et al. (2024)](https://arxiv.org/abs/2401.01967).
+ We use [HH-Golden dataset](https://huggingface.co/datasets/nz/anthropic-hh-golden-rlhf), which manually improves the quality of noisy samples in the HH-RLHF dataset.
 
- - Non-toxic sequences: sampled from WikiText-2.
- - Toxic counterparts: generated using the Plug-and-Play Language Model (PPLM) method to inject toxic content.
  - Data format: (toxic, non-toxic) sentence pairs.
  - Sample size: 500 pairs for ProFS editing (compared to 2,000 pairs used for DPO fine-tuning).
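For context, the updated loading snippet can be exercised end to end roughly as below. This is a minimal sketch: the `model_id` and the final `decode` line come from the README shown in the diff, while the prompt text and `generate` arguments (e.g. `max_new_tokens`) are illustrative assumptions, not part of the commit.

```python
# Minimal usage sketch for the updated model id.
# The prompt and generation settings are illustrative assumptions.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Uppaal/Mistral-ProFS-safety"  # id introduced in this commit

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize an example prompt and generate a short completion.
inputs = tokenizer("How can I politely end a conversation?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)

# The README's later lines decode the first generated sequence.
print(tokenizer.decode(out[0], skip_special_tokens=True))
```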
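The Data section describes 500 (toxic, non-toxic) pairs drawn from HH-Golden for ProFS editing. A minimal sketch of assembling such pairs is below; it assumes the dataset exposes HH-RLHF-style `chosen`/`rejected` columns and a `train` split, and that a plain random sample of 500 rows is acceptable — the commit does not specify the exact selection procedure.

```python
# Sketch: sample 500 preference pairs from HH-Golden for ProFS editing.
# Assumptions: HH-RLHF-style "chosen"/"rejected" text columns, a "train"
# split, and a simple random sample matching the README's "500 pairs".
from datasets import load_dataset

ds = load_dataset("nz/anthropic-hh-golden-rlhf", split="train")
ds = ds.shuffle(seed=0).select(range(500))

# (toxic, non-toxic) pairs: rejected responses serve as the "toxic" side,
# chosen responses as the "non-toxic" side.
pairs = [(row["rejected"], row["chosen"]) for row in ds]
print(len(pairs))
```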