---
license: apache-2.0
language:
- en
base_model:
- allenai/longformer-base-4096
pipeline_tag: text-classification
metrics:
- f1
- precision
- recall
tags:
- text classification
- classifier
- nlp
- bias
- hate speech
- hate
- offensive
---
# HarmFormer

HarmFormer is a fine-tuned `allenai/longformer-base-4096` model that detects potentially harmful content in both short and long text. For each of five harm categories, it scores the input on three dimensions (Safe, Topical, Toxic):
- H: Hate and Violence
- IH: Ideological Harm
- SE: Sexual Harm
- IL: Illegal Activities
- SI: Self-Inflicted Harm

HarmFormer is designed to detect harmful content in text data, especially web pages. It can be used for content moderation, safety checks, and other applications where understanding the nature of a text's harmfulness is crucial.

More details about HarmFormer can be found in our paper, [Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs](https://arxiv.org/pdf/2505.02009).

## Model Details

- **Base Model:** allenai/longformer-base-4096
- **Number of Classes (harm categories):** 5
- **Risk Levels per Class:** 3 (Safe, Topical, Toxic)
- **Max Sequence Length:** 1024

## Usage

```python
from transformers import AutoTokenizer
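# Assumption: `modeling` refers to a modeling.py file obtained alongside the
# model weights; it must be importable from the working directory.
from modeling import HarmFormer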
import torch

# Load the model and tokenizer
model_path = "themendu/HarmFormer"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = HarmFormer.from_pretrained(model_path)

# Prepare input text
text = "Your text here"
inputs = tokenizer(
    text,
    add_special_tokens=True,
    max_length=1024,
    truncation=True,
    padding='max_length',
    return_attention_mask=True,
    return_tensors='pt',
)

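# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# Process outputs: stack the per-class logits and reorder them to
# (batch, num_classes, num_risk_levels)
logits = torch.stack(outputs, dim=0).permute(1, 0, 2)
# Softmax over the last axis so each class's risk-level scores sum to 1
probabilities = torch.softmax(logits, dim=-1)
# Convert to nested Python lists, rounded to 3 decimals for display
predictions = [[[round(prob, 3) for prob in class_probs] for class_probs in sample] for sample in probabilities.cpu().tolist()]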

print(predictions)
```
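
The nested `predictions` list holds, for each sample, one row of three probabilities per harm category. A minimal sketch of turning those rows into readable labels follows. It assumes the rows are ordered as the categories are listed on this card (H, IH, SE, IL, SI) and that the three probabilities are ordered Safe, Topical, Toxic; neither ordering is confirmed here, so check both against the model code before relying on it. `decode_predictions` and the sample values are hypothetical, for illustration only.

```python
# Assumed orderings: they follow the order in which this card lists the
# categories and dimensions, and are NOT confirmed by the model code.
CATEGORIES = ["H", "IH", "SE", "IL", "SI"]
LEVELS = ["Safe", "Topical", "Toxic"]


def decode_predictions(predictions):
    """Map each sample's 5 x 3 probability rows to {category: (level, prob)}."""
    decoded = []
    for sample in predictions:
        labels = {}
        for category, class_probs in zip(CATEGORIES, sample):
            # Index of the highest-probability level for this category
            best = max(range(len(class_probs)), key=class_probs.__getitem__)
            labels[category] = (LEVELS[best], class_probs[best])
        decoded.append(labels)
    return decoded


# Made-up probabilities for one sample, shaped like `predictions` above.
example = [[
    [0.90, 0.08, 0.02],
    [0.10, 0.70, 0.20],
    [0.97, 0.02, 0.01],
    [0.20, 0.30, 0.50],
    [0.99, 0.01, 0.00],
]]
print(decode_predictions(example))
```

Taking the arg-max per category is only one possible decision rule; a thresholded rule may suit moderation pipelines better.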

### Batch Processing

To process multiple texts at once, reuse the `tokenizer` and `model` loaded above:

```python
texts = ["Text 1", "Text 2", "Text 3"]
inputs = tokenizer(
    texts,
    add_special_tokens=True,
    max_length=1024,
    truncation=True,
    padding='max_length',
    return_attention_mask=True,
    return_tensors='pt',
)

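with torch.no_grad():
    outputs = model(**inputs)

# Stack the per-class logits and reorder them to
# (batch, num_classes, num_risk_levels)
logits = torch.stack(outputs, dim=0).permute(1, 0, 2)
probabilities = torch.softmax(logits, dim=-1)
# One nested list of rounded probabilities per input text
predictions = [[[round(prob, 3) for prob in class_probs] for class_probs in sample] for sample in probabilities.cpu().tolist()]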
```
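
For filtering use cases, batch predictions can be reduced to one decision per text. The sketch below flags a text when any category's Toxic probability reaches a threshold. It assumes the Toxic probability is the last of the three values in each row (Safe, Topical, Toxic order), and the `flag_toxic` helper and its 0.5 default are hypothetical choices, not values taken from the paper.

```python
TOXIC_INDEX = 2  # assumption: dimensions are ordered Safe, Topical, Toxic


def flag_toxic(predictions, threshold=0.5):
    """Return one bool per sample: True if any category's Toxic prob >= threshold."""
    return [
        any(class_probs[TOXIC_INDEX] >= threshold for class_probs in sample)
        for sample in predictions
    ]


# Made-up probabilities for two samples, shaped like `predictions` above.
example_batch = [
    [[0.90, 0.08, 0.02]] * 5,
    [[0.90, 0.08, 0.02]] * 4 + [[0.10, 0.20, 0.70]],
]
print(flag_toxic(example_batch))
```

The threshold trades recall for precision, so it is worth tuning on labeled data for the target domain.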

## Citation

If you use this model in your research, please cite:
```bibtex
@misc{mendu2025saferpretraininganalyzingfiltering,
      title={Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs}, 
      author={Sai Krishna Mendu and Harish Yenala and Aditi Gulati and Shanu Kumar and Parag Agrawal},
      year={2025},
      eprint={2505.02009},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.02009}, 
}
```