|
|
--- |
|
|
license: apache-2.0 |
|
|
datasets: |
|
|
- allenai/real-toxicity-prompts |
|
|
language: |
|
|
- en |
|
|
metrics: |
|
|
- accuracy |
|
|
pipeline_tag: text-classification |
|
|
tags: |
|
|
- toxic_classification |
|
|
--- |
|
|
# SegmentCNN Model for Toxic Text Classification |
|
|
|
|
|
## Overview |
|
|
|
|
|
The SegmentCNN model, also known as SpanCNN, is designed for toxic text classification, distinguishing between safe and toxic content. This model is part of the research presented in the paper [CMD: A Framework for Context-aware Model Self-Detoxification](https://arxiv.org/abs/2308.08295).
|
|
|
|
|
## Model Details |
|
|
|
|
|
- **Input**: Text data |
|
|
- **Output**: Integer label
|
|
- `0` represents **safe** content |
|
|
- `1` represents **toxic** content |
|
|
|
|
|
## Usage |
|
|
|
|
|
To use the SegmentCNN model for toxic text classification, follow the example below: |
|
|
|
|
|
```python |
|
|
from transformers import pipeline |
|
|
|
|
|
# Load the SegmentCNN (SpanCNN) model via its custom pipeline
|
|
classifier = pipeline("spancnn-classification", model="ZetangForward/SegmentCNN", trust_remote_code=True) |
|
|
|
|
|
# Example 1: Safe text |
|
|
pos_text = "You look good today~!" |
|
|
result = classifier(pos_text) |
|
|
print(result) # Output: 0 (safe) |
|
|
|
|
|
# Example 2: Toxic text |
|
|
neg_text = "You're too stupid, you're just like a fool" |
|
|
result = classifier(neg_text) |
|
|
print(result) # Output: 1 (toxic) |
|
|
``` |
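For downstream use, you may want to translate the integer output into a human-readable label. The sketch below is a minimal, hypothetical helper, assuming the pipeline returns `0` or `1` as described above; the `stub` classifier stands in for the real pipeline so the example runs without downloading the model:

```python
# Map the model's integer output to a human-readable label.
LABELS = {0: "safe", 1: "toxic"}

def label_text(classifier, text):
    """Run a classifier that returns 0/1 and translate the result to a label."""
    result = classifier(text)
    return LABELS.get(result, "unknown")

# A stub standing in for the real pipeline (which would be loaded as shown above):
stub = lambda text: 1 if "stupid" in text else 0

print(label_text(stub, "You look good today~!"))              # safe
print(label_text(stub, "You're too stupid, you're just like a fool"))  # toxic
```

In production, replace `stub` with the `classifier` loaded from `ZetangForward/SegmentCNN`.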
|
|
|
|
|
## Citation |
|
|
|
|
|
If you find this model useful, please consider citing the original paper: |
|
|
|
|
|
```bibtex |
|
|
@article{tang2023detoxify, |
|
|
title={Detoxify language model step-by-step}, |
|
|
author={Tang, Zecheng and Zhou, Keyan and Wang, Pinzheng and Ding, Yuyang and Li, Juntao and others}, |
|
|
journal={arXiv preprint arXiv:2308.08295}, |
|
|
year={2023} |
|
|
} |
|
|
``` |
|
|
|
|
|
## Disclaimer |
|
|
|
|
|
While the SegmentCNN model is effective at detecting toxic segments within text, we strongly recommend that users carefully review its results and exercise caution when applying this method in real-world scenarios. The model is not infallible, and its outputs should be validated in context-sensitive applications.