Update README.md
README.md
````diff
@@ -40,4 +40,17 @@ We also release our **HarmfulQA** dataset with 1,960 harmful questions (converti
 
 <img src="https://declare-lab.net/assets/images/logos/data_gen.png" alt="Image" width="1000" height="1000">
 
-_Note: This model is referred to as Starling (Blue) in the paper. We shall soon release Starling (Blue-Red) which was trained on harmful data using an objective function that helps the model learn from the red (harmful) response data._
+_Note: This model is referred to as Starling (Blue) in the paper. We shall soon release Starling (Blue-Red) which was trained on harmful data using an objective function that helps the model learn from the red (harmful) response data._
+
+## Citation
+
+```bibtex
+@misc{bhardwaj2023redteaming,
+title={Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment},
+author={Rishabh Bhardwaj and Soujanya Poria},
+year={2023},
+eprint={2308.09662},
+archivePrefix={arXiv},
+primaryClass={cs.CL}
+}
+```
````