Uppaal committed · verified
Commit 2754932 · Parent(s): 4b949d2

Update README.md

Files changed (1):
1. README.md  +2 −4
README.md CHANGED
@@ -93,7 +93,7 @@ Use the code below to get started with the model.
 
  ```
  from transformers import AutoTokenizer, AutoModelForCausalLM
- model_id = "Uppaal/Mistral-ProFS-toxicity"
+ model_id = "Uppaal/Mistral-ProFS-safety"
 
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(model_id)
@@ -108,10 +108,8 @@ print(tokenizer.decode(out[0], skip_special_tokens=True))
  ## Training (Editing) Details
 
  ### Data
- We use the pairwise toxicity preference dataset introduced by [Lee et al. (2024)](https://arxiv.org/abs/2401.01967).
+ We use [HH-Golden dataset](https://huggingface.co/datasets/nz/anthropic-hh-golden-rlhf), which manually improves the quality of noisy samples in the HH-RLHF dataset.
 
- - Non-toxic sequences: sampled from WikiText-2.
- - Toxic counterparts: generated using the Plug-and-Play Language Model (PPLM) method to inject toxic content.
  - Data format: (toxic, non-toxic) sentence pairs.
  - Sample size: 500 pairs for ProFS editing (compared to 2,000 pairs used for DPO fine-tuning).
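For context, the updated loading snippet can be exercised end to end roughly as below. This is a minimal sketch: the `model_id` and the final `decode` line come from the README shown in the diff, while the prompt text and `generate` arguments (e.g. `max_new_tokens`) are illustrative assumptions, not part of the commit.

```python
# Minimal usage sketch for the updated model id.
# The prompt and generation settings are illustrative assumptions.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Uppaal/Mistral-ProFS-safety"  # id introduced in this commit

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize an example prompt and generate a short completion.
inputs = tokenizer("How can I politely end a conversation?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)

# The README's later lines decode the first generated sequence.
print(tokenizer.decode(out[0], skip_special_tokens=True))
```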
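The Data section describes 500 (toxic, non-toxic) pairs drawn from HH-Golden for ProFS editing. A minimal sketch of assembling such pairs is below; it assumes the dataset exposes HH-RLHF-style `chosen`/`rejected` columns and a `train` split, and that a plain random sample of 500 rows is acceptable — the commit does not specify the exact selection procedure.

```python
# Sketch: sample 500 preference pairs from HH-Golden for ProFS editing.
# Assumptions: HH-RLHF-style "chosen"/"rejected" text columns, a "train"
# split, and a simple random sample matching the README's "500 pairs".
from datasets import load_dataset

ds = load_dataset("nz/anthropic-hh-golden-rlhf", split="train")
ds = ds.shuffle(seed=0).select(range(500))

# (toxic, non-toxic) pairs: rejected responses serve as the "toxic" side,
# chosen responses as the "non-toxic" side.
pairs = [(row["rejected"], row["chosen"]) for row in ds]
print(len(pairs))
```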