---
license: apache-2.0
base_model: answerdotai/ModernBERT-base
library_name: transformers
pipeline_tag: text-classification
tags:
- modernbert
- text-classification
- spam-detection
- automation-detection
- long-context
- pytorch
- safetensors
language:
- en
metrics:
- f1
- precision
- recall
---
# raga - Mahoraga from anime (his ability is to adapt to nature itself)
A tiny spicy ModernBERT classifier for text-risk signals - Made by @PotatoOff
> Potato did not write a README, so this appeared by magic!
## What does it classify?
Probably text and account-behavior risk labels, inferred from the eval table below:
- `transactional_spam` — spammy transactional or promo-style content
- `extractive_presence` — likely copy/extraction/presence-pattern signal
- `engagement_automation` — botty engagement / automated interaction signal
- `account_farming` — account-growth or farming behavior signal
Exact label semantics depend on the training data.
## Model
- Base: `answerdotai/ModernBERT-base`
- Type: ModernBERT sequence classifier
- Context: up to 8,192 tokens
- Best for: classification, moderation-ish filters, long text scoring
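For inputs past the 8,192-token window, one common option is to split the token ids into overlapping chunks, score each chunk, and pool the per-label scores (max-pooling is typical for risk signals). A minimal pure-Python sketch of the chunking step; the helper name, `max_len`, and `stride` are illustrative defaults, not settings shipped with this model:

```python
# Hedged sketch: overlapping-window chunking for over-length inputs.
def chunk_token_ids(ids, max_len=8192, stride=4096):
    """Split a token-id list into windows of max_len, advancing by stride."""
    chunks, start = [], 0
    while start < len(ids):
        chunks.append(ids[start:start + max_len])
        if start + max_len >= len(ids):  # last window already covers the tail
            break
        start += stride
    return chunks
```

Score each chunk independently, then take the per-label max across chunks as the document-level score.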
## Eval snapshot
| Label | F1 | Precision | Recall | Notes |
|---|---:|---:|---:|---|
| `transactional_spam` | 0.94 | 0.89 | 0.99 | 🟢 Excellent |
| `extractive_presence` | 0.84 | 0.73 | 0.99 | 🟢 Great recall |
| `engagement_automation` | 0.65 | 0.53 | 0.85 | 🟡 Precision weak |
| `account_farming` | 0.62 | 0.61 | 0.63 | 🟡 Hardest label |
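The F1 column is consistent with the standard harmonic mean of precision and recall, F1 = 2PR / (P + R). A quick arithmetic check against the table:

```python
# Sanity-check the eval table: F1 is the harmonic mean of precision and recall.
eval_table = {  # label: (precision, recall), copied from the table above
    "transactional_spam": (0.89, 0.99),
    "extractive_presence": (0.73, 0.99),
    "engagement_automation": (0.53, 0.85),
    "account_farming": (0.61, 0.63),
}
f1_scores = {
    label: round(2 * p * r / (p + r), 2) for label, (p, r) in eval_table.items()
}
print(f1_scores)
# {'transactional_spam': 0.94, 'extractive_presence': 0.84,
#  'engagement_automation': 0.65, 'account_farming': 0.62}
```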
## Install
```bash
pip install -U "transformers>=4.48.0" torch
```
Optional GPU speedup:
```bash
pip install flash-attn
```
## Inference
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
model_id = "WeReCooking/raga"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
model_id,
torch_dtype=torch.bfloat16 if torch.cuda.is_available() else None,
device_map="auto" if torch.cuda.is_available() else None,
# attn_implementation="flash_attention_2", # optional, if installed
)
text = "paste text to classify here"
inputs = tokenizer(
text,
return_tensors="pt",
truncation=True,
max_length=getattr(model.config, "max_position_embeddings", 8192),
)
# ModernBERT does not need token_type_ids
inputs.pop("token_type_ids", None)
inputs = {k: v.to(model.device) for k, v in inputs.items()}
with torch.no_grad():
logits = model(**inputs).logits[0].float()
id2label = {int(k): v for k, v in model.config.id2label.items()}
multi = getattr(model.config, "problem_type", None) == "multi_label_classification"
scores = torch.sigmoid(logits) if multi else torch.softmax(logits, dim=-1)
for i, score in sorted(enumerate(scores.tolist()), key=lambda x: x[1], reverse=True):
print(f"{id2label.get(i, str(i))}: {score:.4f}")
```
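The sigmoid-vs-softmax branch in the snippet above matters: softmax makes labels compete (the probabilities sum to 1, so exactly one label tends to dominate), while sigmoid scores each label independently, so several labels can fire on the same text. A standalone illustration with toy logits:

```python
import math

def softmax(logits):
    # single-label: scores compete and always sum to 1
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    # multi-label: each label scored independently in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

logits = [2.0, 1.5, -1.0]
probs = softmax(logits)                # sums to 1.0, one label dominates
scores = [sigmoid(x) for x in logits]  # first two both exceed 0.5
```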
## Notes
- A threshold of `0.50` per label is a reasonable multi-label starting point; tune each label separately.
- `transactional_spam` looks strong out of the box.
- `engagement_automation` and `account_farming` probably need calibration before serious use.
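Per-label calibration can be as simple as sweeping a threshold grid over held-out scores and keeping the F1-maximizing value for each label. A hedged sketch; the helper name and grid are illustrative, not part of this repo:

```python
# Hedged sketch: pick the per-label threshold that maximizes F1 on a
# held-out set of (score, 0/1 label) pairs.
def best_threshold(scores, labels, grid=None):
    grid = grid or [i / 100 for i in range(5, 96)]  # 0.05 .. 0.95
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if tp == 0:
            continue  # undefined precision/recall at this threshold
        p, r = tp / (tp + fp), tp / (tp + fn)
        f1 = 2 * p * r / (p + r)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

Run this once per label on validation scores, then apply the per-label thresholds at inference time instead of a flat `0.50`.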